8/2/1989

		Information on the parallel version of the Rochester Simulator
		--------------------------------------------------------------
		
		
	The Linda source code is in 'Rochester 2:src:Linda'. Several files in 'Rochester 2:
src:uniproc' should be compiled with the '-d LINDA' flag if the parallel version is used.
The files are 'commoncm.c', 'conunicm.c', 'executio.c', and 'names.c'. In addition, the
'simain.c' file from the 'uniproc' folder should be replaced with the 'simain.c' from the
'Linda' folder.

	Linda is a very elegant parallel programming model: by adding 6 commands to an existing
programming language, you have parallel programming capabilities. For a description of
Linda, please refer to David Gelernter's article "Getting the Job Done," in the November
1988 issue of Byte magazine.

	The code has been written so that it can be compiled on the Macintosh with a
Linda-C simulator, or on the combined Macintosh - Chorus system. (If you have implemented
Linda on a parallel computer, you may have to make some minor changes; for instance,
the code assumes that it is possible to 'out' arrays into tuple space, which you may not
have implemented.) 

	The way it all works is the following: aside from the Macintosh there are several
processes running in parallel (for instance, in the example included there are 2 
processes). All these processes have their own private copies of the neural network
structure, name table, etc. (the code is written for a distributed memory model, such as
the Chorus, as opposed to a shared memory model). The processes have infinite 'while'
loops, waiting for commands from the Macintosh. When the Mac initializes its data
structures at the beginning of the program, it 'outs' an initialize command into tuple
space which all processes read and execute. If the user types a command that changes
the network structure, the Mac outs a command for all processes to do the same thing.
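
	The command loop described above can be sketched roughly as follows. This is only an
illustration, not the simulator's code: the command names, the 'struct net' stand-in for
the private network copy, and the quit command are all hypothetical, and a plain array
stands in for tuple space so that the sketch compiles without Linda.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command set; the real simulator's commands are not shown here. */
enum command { CMD_INIT, CMD_ADD_UNIT, CMD_QUIT };

/* Stand-in for one process's private copy of the network structure. */
struct net { int initialized; int nunits; };

/* Apply one broadcast command to this process's private copy.
   Returns 0 when the process should leave its loop, 1 otherwise. */
int apply_command(struct net *net, enum command cmd)
{
    assert(net != NULL);
    switch (cmd) {
    case CMD_INIT:     net->initialized = 1; net->nunits = 0; return 1;
    case CMD_ADD_UNIT: net->nunits++;                         return 1;
    case CMD_QUIT:                                            return 0;
    }
    return 1;
}

/* Stand-in for the worker's 'while' loop. A real worker would block on a
   tuple-space read where this sketch indexes the array (assumption).
   Returns the number of commands consumed. */
size_t worker_loop(struct net *net, const enum command *cmds, size_t n)
{
    size_t i = 0;
    while (i < n && apply_command(net, cmds[i++]))
        ;
    return i;
}
```

	Running 'worker_loop' over the same command sequence for two separate 'struct net'
values leaves both in the same state, which is the property the scheme relies on: every
process sees every structural command, so the private copies stay in step.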

	The most important part is the 'Step' function, the actual simulation. In synchronous
mode the Mac outs, aside from the command, a seed which tells the processes which unit
to start the simulation with. For example, if there are 4 processes, the Mac will out
seeds 0,1,2,3, and the four processes will start simulating units 0,1,2,3, respectively.
Then they will jump by 4 (the number of processes) to the next unit, so on their second
pass they will simulate units 4,5,6,7, respectively, and so on. (Actually, all processes
also simulate unit 0, since it is often used to control the simulation of the other
units; see, for example, how the backprop package works.) After all units have been
simulated, the array of outputs is updated. Because the units are divided among the
processes, the bigger the network and the more processors, the bigger the gain in speed
over sequential processing.
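
	As a rough illustration of the seed-and-stride scheme, the sketch below lists the units
that one process would simulate in synchronous mode. The function name and its parameters
are hypothetical, and handling unit 0 first is an assumption; the actual code may order
things differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: write into 'out' the indices of the units that the
   process holding 'seed' simulates, given 'nprocs' processes and 'nunits'
   units. At most 'cap' indices are written. Returns the number written. */
size_t units_for_seed(int seed, int nprocs, int nunits, int *out, size_t cap)
{
    size_t n = 0;
    int u;

    assert(nprocs > 0 && seed >= 0 && seed < nprocs);

    /* Every process simulates unit 0 (assumed here to come first). */
    if (nunits > 0 && n < cap)
        out[n++] = 0;

    /* Start at the seed and jump by the number of processes. */
    for (u = seed; u < nunits; u += nprocs) {
        if (u == 0)
            continue;            /* unit 0 was already handled above */
        if (n < cap)
            out[n++] = u;
    }
    return n;
}
```

	With 4 processes and 10 units, the process with seed 1 gets units 0, 1, 5, and 9, and
the process with seed 0 gets units 0, 4, and 8.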

	By its very nature, asynchronous simulation cannot be done in parallel. It is currently
handled as follows: if the execution mode is asynchronous or fair asynchronous, the Mac
outs seeds as before, but only the process with seed #0 does anything (I am assuming the
processes will be running on something much faster than the 68020, something like a RISC
chip). So everybody sits around and waits for this one process to finish. When it is
done, it sends the net values back to the Mac. However, one could have
'pseudo-asynchronous' simulation, where a random subset of units is simulated
synchronously. That could be done in parallel, but it has not been implemented yet.
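
	The rule for which processes do any work in each execution mode can be written as a
single test. This is a minimal sketch; the enum and function names are hypothetical and
are not taken from the simulator source.

```c
#include <assert.h>

/* Hypothetical names for the three execution modes mentioned in the text. */
enum exec_mode { MODE_SYNC, MODE_ASYNC, MODE_FAIR_ASYNC };

/* Returns 1 if the process holding 'seed' should simulate units in this
   mode, 0 if it should just wait for the next command. */
int process_does_work(enum exec_mode mode, int seed)
{
    assert(seed >= 0);
    if (mode == MODE_SYNC)
        return 1;        /* synchronous: every process takes its share */
    return seed == 0;    /* asynchronous modes: only seed #0 runs */
}
```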

	There is an EXAMPLE of a parallel simulator included in 'Rochester 1:Parallel example'.
It has been compiled and linked with our Linda-C simulator. It is a 180-16-8 backpropagation
network that has been trained to recognize 4 objects when presented with a histogram
of the relative strengths of various lines from angles 0 through 179. It is the demo
network that we had at the IJCNN '89 Washington conference.
To use it: 		1. run the simulator ('sim.parallel');
				2. type 'read log' to execute the commands in the log file;
				3. type 'call setinput bin <file>', where file is either disk10,
							ivory10, jerg10, or pdp10 (these are histograms of a Kodak disk,
							a bar of Ivory soap, a Jergens bottle of hand lotion, and the
							PDP book, all at a rotation of 10 degrees);
				4. type 'go 2' to simulate for 2 steps; each seed will print its seed
							number and the index of the unit it is simulating;
				5. type 'l u 198 - 205' to list the output units; the first four output
							units are disk, ivory, jerg, and pdp respectively;
				6. type 'pot 0 1'; this is needed because, with the way the Linda
							simulator works, you cannot execute the go command twice in a
							row without running a different command that affects the net
							structure in between; if you look at the code you will see why.
							This restriction would not exist on a real parallel machine.
				7. repeat steps 3 through 6.
				
	If you have any questions about the implementation, Linda, the Linda-C simulator,
or the Chorus, please feel free to give us a call at (212) 925-1715.